Overview of the TREC 2011 Entity Track
نویسندگان
چکیده
The TREC Entity track aimed to build test collections to evaluate entity-oriented search on Web data. In 2011, the track worked with two corpora: the ClueWeb 2009 web corpus and the new Sindice-2011 dataset [2]. Motivated by observations from the 2010 Entity track, we made the following changes in the track setup for 2011. In the REF task, we focused on modifications to simplify the evaluation and to improve cross-system comparison: (1) only primary homepages are accepted, i.e., relevance is binary; (2) for each answer, a (single) supporting document is required; (3) target type is not limited anymore; (4) groups that generate results using Web Search Engines are required to submit an obligatory run, using the Lemur ClueWeb Online Query Service. The main change regarding the LOD task is the use of the Sindice-2011 corpus, an improved and larger Semantic Web crawl, replacing the BTC-2009 collection used in 2010: (1) the target corpus is a larger and more representative LOD crawl; (2) examples are not mapped manually to LOD, but given as ClueWeb document identifiers. Finally, we introduced a new pilot task, REF-LOD, to explore the differences between ranking web data and ranking semantic web data, possibly enabling deeper investigations into connecting Semantic Web data with the ‘real’ web. We basically repeated the REF task, but requested results identified by their LOD URIs instead of their homepages. One of the goals of this task was to gain more insights in entity representation, and specifically investigate how often entities that are not represented on the web with their own homepage are represented as entities in LOD. In the remainder of the paper we first detail the setup of each task. We present the results collected in this year’s track participation, and summarize the approaches applied. The paper concludes with a brief discussion of the problems faced by the track, and the way forward.
منابع مشابه
Purdue at TREC 2010 Entity Track: A Probabilistic Framework for Matching Types Between Candidate and Target Entities
This paper gives an overview of our work for the TREC 2010 Entity track. The goal of the TREC Entity track is to study entity-related searches on Web data, which has not been sufficiently addressed in prior research. For both the Related Entity Finding (REF) task and the Entity List Completion (ELC) task in this track, we propose a unified probabilistic framework by incorporating the matching b...
متن کاملOverview of the TREC 2011 Legal Track
The TREC 2011 Legal Track consisted of a single task: the learning task, which captured elements of both the TREC 2010 learning and interactive tasks. Participants were required to rank the entire corpus of 685,592 documents by their estimate of the probability of responsiveness to each of three topics, and also to provide a quantitative estimate of that probability. Participants were permitted...
متن کاملTongKey at Entity Track TREC 2011: Related Entity Finding
This paper presents our work done for the TREC 2011 Entity track. A retrieval model was proposed for the task of related entity finding. This model consists of several parts: In order to get more accurate document collection, query analysis method was utilized to format the narrative of each query. Then, our dataset was generated by using ClueWeb09 API 2 . Moreover, we employed the NER tools an...
متن کاملLIA-iSmart at the TREC 2011 Entity Track: Entity List Completion Using Contextual Unsupervised Scores for Candidate Entities Ranking
This paper describes our participation in the Entity List Completion (ELC) task at Entity track 2011. Our approach combined the work done for the Related Entity Finding 2010 task with some new criteria as the proximity or the similarity between a candidate answer with the correct answers given as examples or their cooccurrences.
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2011